Goto

Collaborating Authors

 cell cycle


MODE: Learning compositional representations of complex systems with Mixtures Of Dynamical Experts

Quiblier, Nathan, Friedman, Roy, Ricci, Matthew

arXiv.org Artificial Intelligence

Dynamical systems in the life sciences are often composed of complex mixtures of overlapping behavioral regimes. Cellular subpopulations may shift from cycling to equilibrium dynamics or branch towards different developmental fates. The transitions between these regimes can appear noisy and irregular, posing a serious challenge to traditional, flow-based modeling techniques which assume locally smooth dynamics. To address this challenge, we propose MODE (Mixture Of Dynamical Experts), a graphical modeling framework whose neural gating mechanism decomposes complex dynamics into sparse, interpretable components, enabling both the unsupervised discovery of behavioral regimes and accurate long-term forecasting across regime transitions. Crucially, because agents in our framework can jump to different governing laws, MODE is especially tailored to the aforementioned noisy transitions. We evaluate our method on a battery of synthetic and real datasets from computational biology. First, we systematically benchmark MODE on an unsupervised classification task using synthetic dynamical snapshot data, including in noisy, few-sample settings. Next, we show how MODE succeeds on challenging forecasting tasks which simulate key cycling and branching processes in cell biology. Finally, we deploy our method on human, single-cell RNA sequencing data and show that it can not only distinguish proliferation from differentiation dynamics but also predict when cells will commit to their ultimate fate, a key outstanding challenge in computational biology.


Building multiscale models with PhysiBoSS, an agent-based modeling tool

Ruscone, Marco, Checcoli, Andrea, Heiland, Randy, Barillot, Emmanuel, Macklin, Paul, Calzone, Laurence, Noël, Vincent

arXiv.org Artificial Intelligence

Multiscale models provide a unique tool for studying complex processes that study events occurring at different scales across space and time. In the context of biological systems, such models can simulate mechanisms happening at the intracellular level such as signaling, and at the extracellular level where cells communicate and coordinate with other cells. They aim to understand the impact of genetic or environmental deregulation observed in complex diseases, describe the interplay between a pathological tissue and the immune system, and suggest strategies to revert the diseased phenotypes. The construction of these multiscale models remains a very complex task, including the choice of the components to consider, the level of details of the processes to simulate, or the fitting of the parameters to the data. One additional difficulty is the expert knowledge needed to program these models in languages such as C++ or Python, which may discourage the participation of non-experts. Simplifying this process through structured description formalisms -- coupled with a graphical interface -- is crucial in making modeling more accessible to the broader scientific community, as well as streamlining the process for advanced users. This article introduces three examples of multiscale models which rely on the framework PhysiBoSS, an add-on of PhysiCell that includes intracellular descriptions as continuous time Boolean models to the agent-based approach. The article demonstrates how to easily construct such models, relying on PhysiCell Studio, the PhysiCell Graphical User Interface. A step-by-step tutorial is provided as a Supplementary Material and all models are provided at: https://physiboss.github.io/tutorial/.


Encoding Domain Information with Sparse Priors for Inferring Explainable Latent Variables

Qoku, Arber, Buettner, Florian

arXiv.org Machine Learning

Latent variable models are powerful statistical tools that can uncover relevant variation between patients or cells, by inferring unobserved hidden states from observable high-dimensional data. A major shortcoming of current methods, however, is their inability to learn sparse and interpretable hidden states. Additionally, in settings where partial knowledge on the latent structure of the data is readily available, a statistically sound integration of prior information into current methods is challenging. To address these issues, we propose spex-LVM, a factorial latent variable model with sparse priors to encourage the inference of explainable factors driven by domain-relevant information. spex-LVM utilizes existing knowledge of curated biomedical pathways to automatically assign annotated attributes to latent factors, yielding interpretable results tailored to the corresponding domain of interest. Evaluations on simulated and real single-cell RNA-seq datasets demonstrate that our model robustly identifies relevant structure in an inherently explainable manner, distinguishes technical noise from sources of biomedical variation, and provides dataset-specific adaptations of existing pathway annotations. Implementation is available at https://github.com/MLO-lab/spexlvm.


CellCycleTRACER accounts for cell cycle and volume in mass cytometry data

#artificialintelligence

Single-cell analysis technologies are rapidly improving and will eventually match the performance of their population-level counterparts. RNA transcriptomes can be quantified in thousands of single cells, and analyses of transcriptomes of single cells with spatial resolution in tissues have been reported1,2,3. Mass cytometry has the potential to enable simultaneous detection of up to 50 proteins, protein modifications, such as phosphorylation, and transcripts in single cells4,5,6,7. Recent developments enable highly multiplexed imaging of similar numbers of markers in adherent cells and tissues5,8,9,10. Single-cell data are typically used to identify cell subpopulations that share similar transcript or protein expression or functional markers.


Modular Action Language ALM

Inclezan, Daniela, Gelfond, Michael

arXiv.org Artificial Intelligence

The paper introduces a new modular action language, ALM, and illustrates the methodology of its use. It is based on the approach of Gelfond and Lifschitz (1993; 1998) in which a high-level action language is used as a front end for a logic programming system description. The resulting logic programming representation is used to perform various computational tasks. The methodology based on existing action languages works well for small and even medium size systems, but is not meant to deal with larger systems that require structuring of knowledge. ALM is meant to remedy this problem. Structuring of knowledge in ALM is supported by the concepts of module (a formal description of a specific piece of knowledge packaged as a unit), module hierarchy, and library, and by the division of a system description of ALM into two parts: theory and structure. A theory consists of one or more modules with a common theme, possibly organized into a module hierarchy based on a dependency relation. It contains declarations of sorts, attributes, and properties of the domain together with axioms describing them. Structures are used to describe the domain's objects. These features, together with the means for defining classes of a domain as special cases of previously defined ones, facilitate the stepwise development, testing, and readability of a knowledge base, as well as the creation of knowledge representation libraries. To appear in Theory and Practice of Logic Programming (TPLP).


A Bayesian approach for structure learning in oscillating regulatory networks

Trejo, D, Millar, AJ, Sanguinetti, G

arXiv.org Machine Learning

Oscillations lie at the core of many biological processes, from the cell cycle, to circadian oscillations and developmental processes. Time-keeping mechanisms are essential to enable organisms to adapt to varying conditions in environmental cycles, from day/night to seasonal. Transcriptional regulatory networks are one of the mechanisms behind these biological oscillations. However, while identifying cyclically expressed genes from time series measurements is relatively easy, determining the structure of the interaction network underpinning the oscillation is a far more challenging problem. Here, we explicitly leverage the oscillatory nature of the transcriptional signals and present a method for reconstructing network interactions tailored to this special but important class of genetic circuits. Our method is based on projecting the signal onto a set of oscillatory basis functions using a Discrete Fourier Transform. We build a Bayesian Hierarchical model within a frequency domain linear model in order to enforce sparsity and incorporate prior knowledge about the network structure. Experiments on real and simulated data show that the method can lead to substantial improvements over competing approaches if the oscillatory assumption is met, and remains competitive also in cases it is not.


Representing Biological Processes in Modular Action Language ALM

Inclezan, Daniela (Texas Tech University) | Gelfond, Michael (Texas Tech University)

AAAI Conferences

This paper presents the formalization of a biological process, cell division, in modular action language ALM. We show how the features of ALM — modularity, separation between an uninterpreted theory and its interpretation — lead to a simple and elegant solution that can be used in answering questions from biology textbooks.


Time-Varying Dynamic Bayesian Networks

Song, Le, Kolar, Mladen, Xing, Eric P.

Neural Information Processing Systems

Directed graphical models such as Bayesian networks are a favored formalism to model the dependency structures in complex multivariate systems such as those encountered in biology and neural sciences. When the system is undergoing dynamic transformation, often a temporally rewiring network is needed for capturing the dynamic causal influences between covariates. In this paper, we propose a time-varying dynamic Bayesian network (TV-DBN) for modeling the structurally varying directed dependency structures underlying non-stationary biological/neural time series. This is a challenging problem due the non-stationarity and sample scarcity of the time series. We present a kernel reweighted $\ell_1$ regularized auto-regressive procedure for learning the TV-DBN model. Our method enjoys nice properties such as computational efficiency and provable asymptotic consistency. Applying TV-DBN to time series measurements during yeast cell cycle and brain response to visual stimuli reveals interesting dynamics underlying the respective biological systems.


Articulation and Clarification of the Dendritic Cell Algorithm

Greensmith, Julie, Aickelin, Uwe, Twycross, Jamie

arXiv.org Artificial Intelligence

The Dendritic Cell algorithm (DCA) is inspired by recent work in innate immunity. In this paper a formal description of the DCA is given. The DCA is described in detail, and its use as an anomaly detector is illustrated within the context of computer security. A port scan detection task is performed to substantiate the influence of signal selection on the behaviour of the algorithm. Experimental results provide a comparison of differing input signal mappings.


Gene Expression Clustering with Functional Mixture Models

Chudova, Darya, Hart, Christopher, Mjolsness, Eric, Smyth, Padhraic

Neural Information Processing Systems

We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course data. Each functional cluster center is a nonlinear combination of solutions of a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant. The mixture of continuous time parametric functional forms allows one to (a) account for the heterogeneity in the observed profiles, (b) align the profiles in time by estimating real-valued time shifts, (c) capture the synthesis and decay of mRNA in the course of an experiment, and (d) regularize noisy profiles by enforcing smoothness in the mean curves. We derive an EM algorithm for estimating the parameters of the model, and apply the proposed approach to the set of cycling genes in yeast. The experiments show consistent improvement in predictive power and within cluster variance compared to regular Gaussian mixtures.